
fix: fall back to CPU when CUDA is requested but unavailable (#216) #244

Open

ousamabenyounes wants to merge 1 commit into microsoft:main from ousamabenyounes:fix/issue-216

Conversation

@ousamabenyounes

What does this PR do?

Fixes #216

`PromptCompressor` defaulted to `device_map="cuda"`. On a CPU-only machine (e.g. Windows with torch installed from `--index-url .../whl/cpu`), instantiating the compressor immediately fails with `AssertionError: Torch not compiled with CUDA enabled`, forcing every user to pass `device_map="cpu"` manually and making the copy-paste examples in the README unusable out of the box.

This PR makes `load_model` check `torch.cuda.is_available()` when a CUDA device map is requested. If CUDA isn't available, it emits a `RuntimeWarning` and falls back to `"cpu"` transparently, so existing code with the default `device_map="cuda"` keeps working on GPU machines and no longer crashes on CPU-only machines.
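The guard described above can be sketched as follows. The helper name `resolve_device_map` and the `cuda_available` parameter (standing in for `torch.cuda.is_available()`, so the sketch runs without a torch install) are illustrative, not the exact code in the patch:

```python
import warnings


def resolve_device_map(device_map: str, cuda_available: bool) -> str:
    """Return a usable device map, downgrading CUDA requests to CPU.

    `cuda_available` stands in for torch.cuda.is_available().
    """
    if "cuda" in device_map and not cuda_available:
        warnings.warn(
            f'device_map="{device_map}" was requested but CUDA is not '
            'available; falling back to "cpu".',
            RuntimeWarning,
        )
        return "cpu"
    return device_map
```

When CUDA is present, or the caller already asked for CPU, the requested map passes through unchanged; only the unavailable-CUDA case is rewritten.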

Changes

| File | Change |
| --- | --- |
| `llmlingua/prompt_compressor.py` | Guard `device_map="cuda"` with `torch.cuda.is_available()` and fall back to `"cpu"` with a `RuntimeWarning`. Also adds a `warnings` import. |
| `tests/test_issue_216.py` | Three new regression tests that monkeypatch `torch.cuda.is_available` and `AutoConfig` / `AutoTokenizer` / `AutoModel*`; they run in ~4 s and require no network access or model download. |
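A minimal, self-contained sketch of the monkeypatching pattern those tests rely on, using stdlib `unittest.mock`. The `FakeCuda` class and toy `load_model` are stand-ins for `torch.cuda` and the real loader (which additionally patches the transformers `Auto*` classes):

```python
import warnings
from unittest import mock


class FakeCuda:
    """Stand-in for torch.cuda; the real tests patch torch.cuda.is_available."""

    @staticmethod
    def is_available() -> bool:
        return True


def load_model(device_map: str = "cuda") -> str:
    """Toy version of the guarded loader: returns the device it would use."""
    if "cuda" in device_map and not FakeCuda.is_available():
        warnings.warn("CUDA unavailable; falling back to cpu", RuntimeWarning)
        return "cpu"
    return device_map


def test_falls_back_to_cpu_when_cuda_missing() -> None:
    # Monkeypatch the availability check, mirroring how the real tests
    # simulate a CPU-only torch build without needing special hardware.
    with mock.patch.object(FakeCuda, "is_available", return_value=False):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            assert load_model("cuda") == "cpu"
        assert any(issubclass(w.category, RuntimeWarning) for w in caught)
```

Because everything the loader touches is patched, no model weights are downloaded and the tests stay fast and network-free.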

Behavior matrix

| `device_map` | `torch.cuda.is_available()` | Before | After |
| --- | --- | --- | --- |
| `"cuda"` (default) | `True` | runs on CUDA | runs on CUDA (unchanged) |
| `"cuda"` (default) | `False` | ❌ `AssertionError` | runs on CPU + `RuntimeWarning` |
| `"cpu"` explicit | `False` | runs on CPU | runs on CPU (unchanged) |

Verification

  • Baseline: 2 tests pass, 3 tests fail with a pre-existing `ValueError: too many values to unpack (expected 2)` in `iterative_compress_prompt` (transformers DynamicCache API change — unrelated to this PR).
  • Post-fix: same 2 baseline tests still pass, 3 new tests pass, same 3 pre-existing failures — zero regressions.

Generated by Claude Code
Vibe coded by ousamabenyounes

…ft#216)

PromptCompressor defaulted to device_map="cuda" even when torch was
built without CUDA support, producing "AssertionError: Torch not
compiled with CUDA enabled" on Windows / CPU-only machines. The fix
transparently falls back to "cpu" (with a RuntimeWarning) when "cuda"
is requested but torch.cuda.is_available() is False, so the default
still works out of the box on CPU-only installs.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>


Successfully merging this pull request may close these issues.

[BUG] LLMLingua fails on Windows with PyTorch CPU: "Torch not compiled with CUDA enabled"
